    Measuring Social Biases in Grounded Vision and Language Embeddings

    We generalize the notion of social biases from language embeddings to grounded vision and language embeddings. Biases are present in grounded embeddings, and indeed seem to be equally or more significant than for ungrounded embeddings. This is despite the fact that vision and language can suffer from different biases, which one might hope could attenuate the biases in both. Multiple ways exist to generalize metrics measuring bias in word embeddings to this new setting. We introduce the space of generalizations (Grounded-WEAT and Grounded-SEAT) and demonstrate that three generalizations answer different yet important questions about how biases, language, and vision interact. These metrics are used on a new dataset, the first for grounded bias, created by augmenting standard linguistic bias benchmarks with 10,228 images from COCO, Conceptual Captions, and Google Images. Dataset construction is challenging because vision datasets are themselves very biased. The presence of these biases in systems will begin to have real-world consequences as they are deployed, making carefully measuring bias and then mitigating it critical to building a fair society.
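
    For orientation, the ungrounded WEAT effect size that the abstract builds on can be sketched as follows. This is the standard word-embedding form, not the Grounded-WEAT or Grounded-SEAT variants the paper introduces, which the abstract does not define. The function names, the toy 2-D vectors, and the use of the sample standard deviation are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """How much closer w is to attribute set A than to attribute set B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y,
    normalised by the spread of associations over X and Y together.
    Assumption: sample standard deviation (statistics.stdev)."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Hypothetical toy embeddings: X leans toward A, Y leans toward B.
X = [(1.0, 0.1), (0.9, 0.2)]
Y = [(0.1, 1.0), (0.2, 0.9)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]
effect = weat_effect_size(X, Y, A, B)
```

    A positive value indicates that X is more strongly associated with A than Y is; swapping X and Y flips the sign.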

    Learning a natural-language to LTL executable semantic parser for grounded robotics

    Children acquire their native language with apparent ease by observing how language is used in context and attempting to use it themselves. They do so without laborious annotations, negative examples, or even direct corrections. We take a step toward robots that can do the same by training a grounded semantic parser, which discovers latent linguistic representations that can be used for the execution of natural-language commands. In particular, we focus on the difficult domain of commands with a temporal aspect, whose semantics we capture with Linear Temporal Logic (LTL). Our parser is trained with pairs of sentences and executions as well as an executor. At training time, the parser hypothesizes a meaning representation for the input as a formula in LTL. Three competing pressures allow the parser to discover meaning from language. First, any hypothesized meaning for a sentence must be permissive enough to reflect all the annotated execution trajectories. Second, the executor -- a pretrained end-to-end LTL planner -- must find that the observed trajectories are likely executions of the meaning. Finally, a generator, which reconstructs the original input, encourages the model to find representations that conserve knowledge about the command. Together these ensure that the meaning is neither too general nor too specific. Our model generalizes well, being able to parse and execute both machine-generated and human-generated commands, with near-equal accuracy, despite the fact that the human-generated sentences are much more varied and complex with an open lexicon. The approach presented here is not specific to LTL: it can be applied to any domain where sentence meanings can be hypothesized and an executor can verify these meanings, thus opening the door to many applications for robotic agents. Comment: 10 pages, 2 figures, Accepted in Conference on Robot Learning (CoRL) 202
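
    The first of the three pressures, that a hypothesized formula must admit every annotated trajectory, can be illustrated with a small checker. This is only a sketch: it assumes finite-trace LTL semantics, non-empty traces, and a hypothetical tuple encoding of formulas, and it does not model the paper's parser, planner, or generator.

```python
def holds(formula, trace, i=0):
    """Evaluate an LTL formula at position i of a finite, non-empty trace.

    trace   : list of sets of atomic propositions true at each step
    formula : nested tuples, e.g. ("eventually", ("ap", "kitchen"))
    Assumption: finite-trace semantics, so "next" is false at the last step.
    """
    op = formula[0]
    if op == "ap":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "or":
        return holds(formula[1], trace, i) or holds(formula[2], trace, i)
    if op == "next":
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "eventually":
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "always":
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "until":
        # formula[2] must hold at some step j, with formula[1] holding before j.
        return any(
            holds(formula[2], trace, j)
            and all(holds(formula[1], trace, k) for k in range(i, j))
            for j in range(i, len(trace))
        )
    raise ValueError(f"unknown operator: {op!r}")


def admits_all(formula, traces):
    """True if the formula is permissive enough to accept every trajectory."""
    return all(holds(formula, t) for t in traces)


# Hypothetical annotated executions of "visit the kitchen, then the office".
traces = [
    [{"hall"}, {"kitchen"}, {"office"}],
    [{"hall"}, {"hall"}, {"kitchen"}, {"office"}],
]
kitchen_then_office = (
    "eventually",
    ("and", ("ap", "kitchen"), ("eventually", ("ap", "office"))),
)
too_specific = ("next", ("ap", "kitchen"))  # kitchen exactly one step later
```

    Here `kitchen_then_office` admits both trajectories, while `too_specific` fits only the first and would be rejected by this check alone.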

    DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity

    The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. Our indicators complement qualitative analysis of the broader impact of such systems by enabling automatic and efficient benchmarking of geographic disparities, an important step towards building responsible visual content creation systems. We use our proposed indicators to analyze potential geographic biases in state-of-the-art visual content creation systems and find that: (1) models have less realism and diversity of generations when prompting for Africa and West Asia than Europe, (2) prompting with geographic information comes at a cost to prompt-consistency and diversity of generated images, and (3) models exhibit more region-level disparities for some objects than others. Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation. Our comprehensive evaluation constitutes a crucial step towards ensuring a positive experience of visual content creation for everyone.

    FACET: Fairness in Computer Vision Evaluation Benchmark

    Computer vision models have known performance disparities across attributes such as gender and skin tone. This means during tasks such as classification and detection, model performance differs for certain classes based on the demographics of the people in the image. These disparities have been shown to exist, but until now there has not been a unified approach to measure these differences for common use-cases of computer vision models. We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks: image classification, object detection and segmentation. For every image in FACET, we hired expert reviewers to manually annotate person-related attributes such as perceived skin tone and hair type, manually draw bounding boxes and label fine-grained person-related classes such as disk jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art vision models and present a deeper understanding of potential performance disparities and challenges across sensitive demographic attributes. With the exhaustive annotations collected, we probe models using single demographic attributes as well as multiple attributes using an intersectional approach (e.g. hair color and perceived skin tone). Our results show that classification, detection, segmentation, and visual grounding models exhibit performance disparities across demographic attributes and intersections of attributes. These harms suggest that not all people represented in datasets receive fair and equitable treatment in these vision tasks. We hope current and future results using our benchmark will contribute to fairer, more robust vision models. FACET is available publicly at https://facet.metademolab.com

    Study of Natural Health Product Adverse Reactions (SONAR): Active Surveillance of Adverse Events Following Concurrent Natural Health product and Prescription Drug Use in Community Pharmacies

    Background: Many consumers use natural health products (NHPs) concurrently with prescription medications. As NHP-related harms are under-reported through passive surveillance, the safety of concurrent NHP-drug use remains unknown. Objective: To conduct active surveillance in participating community pharmacies to identify adverse events related to concurrent NHP-prescription drug use. Methodology/Principal Findings: Participating pharmacists asked individuals collecting prescription medications about (i) concurrent NHP/drug use in the previous three months and (ii) experiences of adverse events. If an adverse event was identified and if the patient provided written consent, a research pharmacist conducted a guided telephone interview to gather additional information after obtaining additional verbal consent and documenting so within the interview form. Over a total of 112 pharmacy weeks, 2615 patients were screened, of which 1037 (39.7%; 95% CI: 37.8% to 41.5%) reported concurrent NHP and prescription medication use. A total of 77 patients reported a possible AE (2.94%; 95% CI: 2.4% to 3.7%), which represents 7.4% of those using NHPs and prescription medications concurrently (95% CI: 6.0% to 9.2%). Of 15 patients available for an interview, 4 (26.7%; 95% CI: 4.3% to 49.0%) reported an AE that was determined to be “probably” due to NHP use. Conclusions/Significance: Active surveillance markedly improves identification and reporting of adverse events associated with concurrent NHP-drug use. Although not without challenges, active surveillance is feasible and can generate adverse event data of sufficient quality to allow for meaningful adjudication to assess potential harms.
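
    The proportions and 95% confidence intervals quoted above can be rechecked from the raw counts. The abstract does not say how the intervals were computed, so the sketch below assumes two common choices, the Wilson score interval and the simpler Wald (normal-approximation) interval, with z = 1.96. The function names are illustrative only.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def wald_interval(successes, n, z=1.96):
    """Wald (normal-approximation) interval for a binomial proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def as_percent(interval):
    """Convert a (low, high) pair of proportions to percentages, 1 decimal."""
    return tuple(round(100 * x, 1) for x in interval)


# Counts taken from the abstract.
concurrent_use = as_percent(wilson_interval(1037, 2615))  # reported 37.8 to 41.5
possible_ae = as_percent(wilson_interval(77, 2615))       # reported 2.4 to 3.7
ae_among_users = as_percent(wilson_interval(77, 1037))    # reported 6.0 to 9.2
probable_ae = as_percent(wald_interval(4, 15))            # reported 4.3 to 49.0
```

    Under these assumptions the first three reported intervals agree with the Wilson form and the fourth with the Wald form, to one decimal place. This is a consistency check on the arithmetic, not a statement of the authors' method.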

    Family composition and age at menarche: findings from the international Health Behaviour in School-Aged Children Study

    This research was funded by The University of St Andrews and NHS Health Scotland. Background: Early menarche has been associated with father absence, stepfather presence and adverse health consequences in later life. This article assesses the association of different family compositions with the age at menarche. Pathways are explored which may explain any association between family characteristics and pubertal timing. Methods: Cross-sectional, international data on the age at menarche, family structure and covariates (age, psychosomatic complaints, media consumption, physical activity) were collected from the 2009–2010 Health Behaviour in School-aged Children (HBSC) survey. The sample focuses on 15-year-old girls comprising 36,175 individuals across 40 countries in Europe and North America (N = 21,075 for age at menarche). The study examined the association of different family characteristics with age at menarche. Regression and path analyses were applied incorporating multilevel techniques to adjust for the nested nature of data within countries. Results: Living with mother (Cohen’s d = .12), father (d = .08), brothers (d = .04) and sisters (d = .06) are independently associated with later age at menarche. Living in a foster home (d = −.16), with ‘someone else’ (d = −.11), stepmother (d = −.10) or stepfather (d = −.06) was associated with earlier menarche. Path models show that up to 89% of these effects can be explained through lifestyle and psychological variables. Conclusions: Earlier menarche is reported amongst those with living conditions other than a family consisting of two biological parents. This can partly be explained by girls’ higher Body Mass Index in these families which is a biological determinant of early menarche. Lower physical activity and elevated psychosomatic complaints were also more often found in girls in these family environments.

    Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

    We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation.

    Trends in the perceived body size of adolescent males and females in Scotland, 1990–2014: changing associations with mental well-being

    Objectives: This paper explores trends in Scottish adolescents’ body size perceptions and associated mental well-being outcomes. Methods: Data were collected on Scottish 11-, 13- and 15-year-olds by the Health Behaviour in School-aged Children study between 1990 and 2014 (n=42,312). Logistic regression was used to examine changes in the prevalence of over- and underweight perceptions. Ordinal and linear regression was used to examine changes in the association between body perception and mental well-being. Results: Little change was observed in over- or underweight perceptions between 1990 and 2014. However, relative to those perceiving their body as ‘about right’, those perceiving themselves as overweight reported decreasing confidence (all groups), decreasing happiness (11- and 13-year-old girls) and increasing psychological symptoms (all girls and 15-year-old boys). Perceived underweight is associated with poor well-being, especially in males, but we present little evidence that this is a recent phenomenon. Conclusions: We present evidence suggesting that the influence of body image on adolescent mental health is increasing over time. This may play a role in the recently observed worsening of mental well-being in Scottish adolescents.